Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | ir | 26 | pat |
2 | – | 27 | dar |
3 | kad | 28 | jų |
4 | į | 29 | labai |
5 | su | 30 | iki |
6 | yra | 31 | būti |
7 | iš | 32 | metu |
8 | tai | 33 | už |
9 | buvo | 34 | po |
10 | kaip | 35 | kai |
11 | ar | 36 | turi |
12 | savo | 37 | metų |
13 | tik | 38 | dėl |
14 | apie | 39 | to |
15 | o | 40 | jo |
16 | ne | 41 | bus |
17 | nuo | 42 | prie |
18 | bei | 43 | arba |
19 | bet | 44 | jis |
20 | Lietuvos | 45 | galima |
21 | m. | 46 | kas |
22 | taip | 47 | mūsų |
23 | gali | 48 | d. |
24 | per | 49 | nes |
25 | jau | 50 | daugiau |
The table shows the 50 most frequent words of the corpus. At these ranks, the entries are usually stopwords.
This list is a good candidate for a first stopword list for a language.
Usually, a small, balanced corpus is enough to obtain a good list of high-frequency words. If the small corpus has a very prominent topic, however, topic-specific words will appear even among the top-ranked words.
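As a rough illustration of how such a frequency list can be derived, the sketch below counts whitespace-separated tokens in a toy corpus and ranks them. The function name `top_words` and the sample sentences are assumptions made for this example. The tokenization used to build the actual corpus is not described here and is likely more elaborate.

```python
from collections import Counter


def top_words(sentences, n=50):
    """Return the n most frequent tokens as (rank, word, count) tuples.

    Tokens are split on whitespace only, which is a simplification.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(n), start=1)
    ]


# Hypothetical toy corpus; a real corpus would contain many more sentences.
sample = [
    "jis yra namie ir ji yra namie",
    "tai yra gerai ir tai yra labai gerai",
]

for rank, word, count in top_words(sample, n=3):
    print(rank, word, count)
```

With a corpus this small, content words such as `namie` rank close to the function words, which is the topic effect described above in miniature.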
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
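The query above can be tried against a small stand-in database. The following sketch uses Python's built-in `sqlite3` module with an in-memory table, which is a substitute for whatever database the corpus is actually stored in. It assumes that `w_id` values are assigned in descending order of frequency and that the first 100 IDs are reserved for other entries, as the `w_id>100` condition suggests. The two-column schema and the placeholder rows are hypothetical.

```python
import sqlite3

# Same statement as in the text, reformatted over several lines.
QUERY = """
select w_id - 100 as rank_in_wordlist, word
from words
where w_id > 100
order by w_id
limit 50;
"""


def top_ranked(conn):
    """Run the ranking query and return a list of (rank, word) rows."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real table may have more columns.
conn.execute("create table words (w_id integer primary key, word text)")

# Assumption: IDs 1..100 are reserved, so fill them with placeholders.
conn.executemany(
    "insert into words values (?, ?)",
    [(i, f"<reserved-{i}>") for i in range(1, 101)],
)
# The first three entries of the table above, stored from ID 101 onwards.
conn.executemany(
    "insert into words values (?, ?)",
    [(101, "ir"), (102, "–"), (103, "kad")],
)

rows = top_ranked(conn)
for rank, word in rows:
    print(rank, word)
```

Subtracting 100 from `w_id` turns the stored ID into a rank starting at 1, so the reserved entries never appear in the result.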
3.4 Sample words for different frequency ranges